FunSparse
=========================

.. _funsparse-label:

Method Description
------------------
Sparse Functional K-means Clustering is an advanced clustering technique designed for functional data,
such as time series or curves. It extends traditional K-means clustering by incorporating sparsity to select the most relevant features of the data for clustering, enhancing interpretability and accuracy.

- Functional Data Representation

Each observed curve is represented in a continuous form, ensuring the data are suitable for clustering.

- Sparsity Constraint

A sparsity constraint is introduced to select the most relevant parts of the domain, controlled by a parameter m that specifies the measure of the domain where the weighting function is zero.

- Optimization Problem

The clustering problem is formulated as a variational problem, maximizing the weighted between-cluster sum of squares (BCSS) subject to the sparsity constraint.

- Iterative Algorithm

An iterative algorithm is used to solve the optimization problem. The algorithm alternates between:

    1. **Weighting Function Update**: Given the current clustering, the optimal weighting function is computed using the solution to the variational problem. This step identifies the most relevant parts of the domain for clustering.
    2. **Clustering Update**: Given the weighting function, the optimal clustering is found by applying a functional K-means clustering algorithm, where the distance between functions is weighted according to the weighting function.

- Parameter Tuning

The sparsity parameter m is tuned using a permutation-based GAP statistics approach to determine the optimal level of sparsity.

- Visualization and Interpretation

The results are visualized through estimated cluster mean functions and the weighting function, highlighting the most discriminative parts of the domain.

Function
--------------
This method provides three core functions: **sparse_sim_data**, **sparse_bifunc** and **FDPlot.sparse_fdplot**.
In this section, we detail their respective usage, as well as parameters, output values and usage examples for each function. 

sparse_sim_data
~~~~~~~~~~~~~~~~~
**sparse_sim_data** generates simulated data according to the FunSparse model, and a true clustering result.

.. code-block:: python

    sparse_sim_data(n, x, paramC, plot = False)

Parameter
^^^^^^^^^^

.. list-table:: 
   :widths: 30 70
   :header-rows: 1
   :align: center

   * - Parameter
     - Description
   * - **n**
     - integer, the number of observations or curves to generate for each cluster.
   * - **x**
     - array, the domain over which the functional data is defined. This is typically a set of points (e.g., time points or spatial coordinates) where the functional data is observed.
   * - **paramC**
     - numeric, a parameter that controls the overlap between the two clusters. It specifies the proportion of the domain where the mean functions of the two clusters are identical.
   * - **plot**
     - bool, whether to plot the generated data. Default is False.


Value
^^^^^^^^^
The function **sparse_sim_data** outputs a matrix contains simulated data, and a sequence contains true clustering result.

- **data**: array, a simulated data generated by FunSparse model.

- **cluster**: array, true clustering results.

If **plot=True**, a visualization of simulated data will be displayed.

.. image:: /_static/sparse_data.png
   :width: 400
   :align: center

Example
^^^^^^^^
.. code-block:: python

    import numpy as np
    from BiFuncLib.simulation_data import sparse_sim_data
    paramC = 0.7
    n = 100
    x = np.linspace(0, 1, 1000)
    sparse_simdata = sparse_sim_data(n, x, paramC)['data']


sparse_bifunc
~~~~~~~~~~~~~~~
**sparse_bifunc** performs model fitting.

.. code-block:: python

  sparse_bifunc(data, x, K, method = 'kmea', true_clus = None)


Parameter
^^^^^^^^^^

.. list-table:: 
   :widths: 30 70
   :header-rows: 1
   :align: center

   * - Parameter
     - Description
   * - **data**
     - array, the functional data to be clustered. This is typically a 2D array where each column represents a functional observation.
   * - **x**
     - array, the domain over which the functional data is defined. This is typically a set of points (e.g., time points or spatial coordinates) where the functional data is observed.
   * - **k**
     - integer, the number of clusters to form.
   * - **method**
     - str, the clustering method to use, 'hier' or 'kmea'. Default is 'kmea'.
   * - **true_clus**
     - numeric or None, The true cluster labels. If known, the Classification Error Rate (CER) will be calculated to evaluate the clustering performance.

Value
^^^^^^^^^
The function **sparse_bifunc** outputs a dict including clustering results and the value of CER (if **true_clus=True**).

- **result**: dict, clustering results for FunSparse model, including:
    1. **cluster**: array, indicates the cluster assignment for each data point.
    2. **iteration**: integer, the number of iterations the algorithm has run.
    3. **obj**: numeric, the value of the objective function.
    4. **w**: array, weighting function used in sparse clustering.

- **CER**: numeric, the Classification Error Rate (CER).


Example
^^^^^^^^
.. code-block:: python

    import numpy as np
    from BiFuncLib.simulation_data import sparse_sim_data
    from BiFuncLib.sparse_bifunc import sparse_bifunc
    K = 2
    paramC = 0.7
    n = 100
    x = np.linspace(0, 1, 1000)
    sparse_simdata = sparse_sim_data(n, x, paramC)['data']
    part_vera = sparse_sim_data(n, x, paramC)['cluster']
    sparse_res = sparse_bifunc(sparse_simdata, x, K, true_clus = part_vera)


FDPlot.sparse_fdplot
~~~~~~~~~~~~~~~~~~~~~~~~
**FDPlot.sparse_fdplot** visualizes the result generated by **pf_bifunc** function.

.. code-block:: python

    FDPlot(result).sparse_fdplot(x, data)


Parameter
^^^^^^^^^^
.. list-table:: 
   :widths: 30 70
   :header-rows: 1
   :align: center

   * - Parameter
     - Description
   * - **result**
     - dict, a clustering result generated by **sparse_bifunc** function.
   * - **x**
     - array, the domain over which the functional data is defined. This is typically a set of points (e.g., time points or spatial coordinates) where the functional data is observed.
   * - **data**
     - array, the functional data to be clustered. This is typically a 2D array where each column represents a functional observation.


Value
^^^^^^^^^
The function outputs two graphs.

The first image represents the outcomes of applying sparse functional K-means clustering to functional datasets,
with each curve corresponding to a data point and different colors indicating distinct clusters.

.. image:: /_static/sparse_cluster.png
   :width: 400
   :align: center

The second image illustrates the weighting function resulting from the sparse clustering algorithm,
highlighting the critical portions of the data domain that are essential for cluster differentiation.

.. image:: /_static/sparse_weighting.png
   :width: 400
   :align: center


Example
^^^^^^^^
.. code-block:: python

    import numpy as np
    from BiFuncLib.simulation_data import sparse_sim_data
    from BiFuncLib.sparse_bifunc import sparse_bifunc
    from BiFuncLib.FDPlot import FDPlot
    K = 2
    paramC = 0.7
    n = 100
    x = np.linspace(0, 1, 1000)
    sparse_simdata = sparse_sim_data(n, x, paramC)['data']
    part_vera = sparse_sim_data(n, x, paramC)['cluster']
    sparse_res = sparse_bifunc(sparse_simdata, x, K, true_clus = part_vera)
    FDPlot(sparse_res).sparse_fdplot(x, sparse_simdata)